Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
在3D视觉中,视觉重新定位已被广泛讨论:鉴于预构建的3D视觉图,估计查询图像的6 DOF(自由度)姿势。大规模室内环境中的重新定位可实现有吸引力的应用程序,例如增强现实和机器人导航。但是,当相机移动时,在这种环境中,外观变化很快,这对于重新定位系统来说是具有挑战性的。为了解决这个问题,我们建议一种基于虚拟视图综合方法Rendernet,以丰富有关此特定情况的数据库和完善姿势。我们选择直接渲染虚拟观点的必要全局和本地特征,而不是渲染需要高质量3D模型的真实图像,并分别将它们应用于后续图像检索和功能匹配操作中。所提出的方法在很大程度上可以改善大规模室内环境中的性能,例如,在INLOC数据集中获得7.1 \%和12.2 \%的改善。
translated by 谷歌翻译
尽管基于深度学习的单眼行人检测方法取得了长足的进步,但它们仍然容易受到沉重的阻塞。使用多视图信息融合是一个潜在的解决方案,但由于缺乏注释的培训样本,因此应用程序有限,因此可以增加过度拟合的风险。为了解决这个问题,提出了一种数据增强方法,以随机生成3D圆柱体阻塞的地面平面,该缸的平均规模是行人的平均大小,并预测了多种视图,以减轻训练过度拟合的影响。此外,每个视图的特征映射都通过使用同符,将每个视图的特征图投影到不同高度的多个平行平面,这使CNN可以充分利用每个行人高度上的特征来推断地面上的行人位置。与最先进的基于深度学习的方法相比,提出的3Drom方法具有大大提高的性能。
translated by 谷歌翻译
运动,作为视频中最明显的现象,涉及随时间的变化,对视频表示学习的发展是独一无二的。在本文中,我们提出了问题:特别是对自我监督视频表示学习的运动有多重要。为此,我们撰写了一个二重奏,用于利用对比学习政权的数据增强和特征学习的动作。具体而言,我们介绍了一种以前的对比学习(MCL)方法,其将这种二重奏视为基础。一方面,MCL大写视频中的每个帧的光流量,以在时间上和空间地样本地样本(即,横跨时间的相关帧斑块的序列)作为数据增强。另一方面,MCL进一步将卷积层的梯度图对准来自空间,时间和时空视角的光流程图,以便在特征学习中地进行地面运动信息。在R(2 + 1)D骨架上进行的广泛实验证明了我们MCL的有效性。在UCF101上,在MCL学习的表示上培训的线性分类器实现了81.91%的前1个精度,表现优于6.78%的训练预测。在动力学-400上,MCL在线方案下实现66.62%的前1个精度。代码可在https://github.com/yihengzhang-cv/mcl-motion-focused-contrastive-learning。
translated by 谷歌翻译
我们介绍了三级管道:调整多样化输入(RDIM),多样性集合(DEM)和区域配件,共同产生可转移的对抗性示例。我们首先探讨现有攻击之间的内部关系,并提出能够利用这种关系的RDIM。然后我们提出DEM,多尺度版本的RDIM,生成多尺度梯度。在前两个步骤之后,我们将价值转换为迭代拟合的区域。 RDIM和区域拟合不需要额外的运行时间,这三个步骤可以充分集成到其他攻击中。我们最好的攻击愚弄了六个黑匣子防御,平均成功率为93%,这均高于最先进的基于梯度的攻击。此外,我们重新思考现有的攻击,而不是简单地堆叠在旧的旧方法上以获得更好的性能。预计我们的调查结果将成为探索攻击方法之间内部关系的开始。代码在https://github.com/278287847/DEM中获得。
translated by 谷歌翻译
We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from von-Neuman-Landauer's principle. modelling human learning is difficult as how people learn varies from one to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models including Free Energy Principle and Bayesian Program Learning that model such learning, approximate our theory, under Church-Turing thesis. We find that deep generative model like variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models including deep neural networks, for image recognition, low resource language processing, and character recognition.
translated by 谷歌翻译
Accurate determination of a small molecule candidate (ligand) binding pose in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the pocket flexibility of protein, while the more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening, requiring neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software, and more importantly achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. 3T structure transformation is decoupled from the system physics, making future usage in other computational scientific domains possible.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译